24 research outputs found

    Simulating the behavior of the human brain on GPUs

    The simulation of the behavior of the human brain is one of the most important challenges in computing today. The main problem consists of finding efficient ways to manipulate and compute the huge volume of data that this kind of simulation needs, using current technology. This work focuses on one of the main steps of such a simulation: computing the voltage on the neurons’ morphology. This is carried out using the Hines algorithm and, although this algorithm is the optimal method in terms of number of operations, it needs non-trivial modifications to be efficiently parallelized on GPUs. We propose several optimizations to accelerate this algorithm on GPU-based architectures, exploring the limitations of both the method and the architecture, in order to efficiently solve a high number of Hines systems (neurons). Each optimization is analyzed and described in depth. Two different approaches are studied, one for mono-morphology simulations (a batch of neurons with the same shape) and one for multi-morphology simulations (a batch of neurons where every neuron has a different shape). In mono-morphology simulations we obtain good performance using a single kernel to compute all the neurons. However, this turns out to be inefficient for multi-morphology simulations, where a much more complex implementation is necessary to obtain good performance. In this case, we must execute more than one GPU kernel. Each execution (kernel call) solves one specific part of the batch of neurons. These parts can be seen as multiple independent tridiagonal systems. Although the present paper focuses on the simulation of the behavior of the human brain, some of these techniques, in particular those related to solving tridiagonal systems, can also be used for oil and gas simulations.
Our studies show that the proposed optimizations achieve high performance on computations with a high number of neurons: our GPU implementations are about 4× and 8× faster than the OpenMP multicore implementation (16 cores), using one and two NVIDIA K80 GPUs respectively. It is also important to highlight that these optimizations continue to scale, even when dealing with a very high number of neurons. This project has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under Grant Agreement No. 720270 (HBP SGA1), from the Spanish Ministry of Economy and Competitiveness under the project Computación de Altas Prestaciones VII (TIN2015-65316-P), and from the Departament d’Innovació, Universitats i Empresa de la Generalitat de Catalunya, under project MPEXPAR: Models de Programació i Entorns d’Execució Paral·lels (2014-SGR-1051). We acknowledge the support of NVIDIA through the BSC/UPC NVIDIA GPU Center of Excellence, and of the European Union’s Horizon 2020 Research and Innovation Programme under the Marie Sklodowska-Curie Grant Agreement No. 749516. Peer Reviewed. Postprint (published version)
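For readers unfamiliar with the Hines algorithm named in this abstract, the following is a minimal CPU sketch in Python, not the authors' GPU implementation. It assumes the usual Hines layout: compartments are numbered so that every compartment's parent has a smaller index, and the matrix has one diagonal entry per compartment plus one coupling entry in each direction between a compartment and its parent. The names `hines_solve`, `apply_hines_matrix`, `solve_batch`, `parent`, `d`, `u`, `l` and `rhs` are illustrative and do not come from the paper. When every compartment's parent is its predecessor (no branching), the system is tridiagonal and the procedure reduces to the Thomas algorithm.

```python
def hines_solve(parent, d, u, l, rhs):
    """Solve one Hines system in O(n) operations.

    Assumed layout (illustrative): parent[0] is unused (root) and
    parent[i] < i for i >= 1. Row i holds d[i] on the diagonal and
    l[i] in column parent[i]; row parent[i] holds u[i] in column i.
    """
    n = len(d)
    d = list(d)      # work on copies so the inputs are not modified
    x = list(rhs)
    # Backward sweep: eliminate each compartment from its parent's row,
    # visiting leaves before the nodes closer to the root.
    for i in range(n - 1, 0, -1):
        p = parent[i]
        factor = u[i] / d[i]
        d[p] -= factor * l[i]
        x[p] -= factor * x[i]
    # Forward substitution: from the root down to the leaves.
    x[0] /= d[0]
    for i in range(1, n):
        x[i] = (x[i] - l[i] * x[parent[i]]) / d[i]
    return x


def apply_hines_matrix(parent, d, u, l, x):
    """Multiply the Hines matrix by x (used only to check a solution)."""
    y = [d[i] * x[i] for i in range(len(d))]
    for i in range(1, len(d)):
        p = parent[i]
        y[i] += l[i] * x[p]
        y[p] += u[i] * x[i]
    return y


def solve_batch(parent, systems):
    """Mono-morphology batch: every system shares the same parent array."""
    return [hines_solve(parent, d, u, l, rhs) for (d, u, l, rhs) in systems]


# A small branching morphology: node 1 has two children (2 and 3).
parent = [-1, 0, 1, 1, 0]
d = [4.0, 4.0, 4.0, 4.0, 4.0]
u = [0.0, -1.0, -1.0, -1.0, -1.0]
l = [0.0, -1.0, -1.0, -1.0, -1.0]
rhs = [1.0, 2.0, 3.0, 4.0, 5.0]
x = hines_solve(parent, d, u, l, rhs)
```

In this sketch the systems of a batch are independent, so a GPU version could plausibly assign one system to each thread. How the paper actually maps work to threads and lays out memory is not stated in the abstract.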

    Semantic resource allocation with historical data based predictions

    One of the most important issues for Service Providers in Cloud Computing is delivering a good quality of service. This is achieved by adapting to a changing environment in which different failures can occur during the execution of services and tasks. Some of these failures can be predicted by taking into account information obtained from previous executions. The results of these predictions help the schedulers improve the allocation of resources to the different tasks. In this paper, we present a framework that uses semantically enhanced historical data to predict the behavior of tasks and resources in the system, and allocates resources according to these predictions.

    TANGO: Transparent heterogeneous hardware Architecture deployment for eNergy Gain in Operation

    The paper is concerned with how software systems actually use Heterogeneous Parallel Architectures (HPAs), with the goal of optimizing power consumption on these resources. It argues the need for novel methods and tools to support software developers aiming to optimise the power consumption that results from designing, developing, deploying and running software on HPAs, while maintaining other quality aspects of software at adequate and agreed levels. To do so, a reference architecture to support energy efficiency at application construction, deployment, and operation is discussed, as well as its implementation and evaluation plans. Comment: Part of the Program Transformation for Programmability in Heterogeneous Architectures (PROHA) workshop, Barcelona, Spain, 12th March 2016, 7 pages, LaTeX, 3 PNG figures

    MPI+OpenMP tasking scalability for the simulation of the human brain

    The simulation of the behavior of the human brain is one of the most ambitious challenges today, with an endless number of important applications. Many different initiatives in the USA, Europe and Japan attempt to achieve this challenging target. In this work we focus on the most important European initiative (the Human Brain Project) and on one of its tools (Arbor). This tool simulates the spikes triggered in a neuronal network by computing the voltage capacitance on the neurons' morphology, and is one of the most precise simulators today. In the present work, we evaluate the use of MPI+OpenMP tasking on top of the Arbor simulator. We present the main characteristics of the Arbor tool and how these can be efficiently managed by using MPI+OpenMP tasking. We show that this approach achieves good scaling even when computing a relatively low workload (number of neurons) per node, using up to 32 nodes. Our target is not only to achieve a highly scalable implementation based on MPI, but also to develop a tool with a high degree of abstraction without losing control and performance, by using MPI+OpenMP tasking. We would like to acknowledge the valuable feedback and help provided by Benjamin Cumming and Alexander Peyser. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 720270 (HBP SGA1 and HBP SGA2), from the Spanish Ministry of Economy and Competitiveness under the project Computación de Altas Prestaciones VII (TIN2015-65316-P) and from the Departament d’Innovació, Universitats i Empresa de la Generalitat de Catalunya, under project MPEXPAR: Models de Programació i Entorns d’Execució Paral·lels (2014-SGR-1051). This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No. 749516. Peer Reviewed. Postprint (author version)

    cuHinesBatch: solving multiple Hines systems on GPUs Human Brain Project

    The simulation of the behavior of the human brain is one of the most important challenges in computing today. The main problem consists of finding efficient ways to manipulate and compute the huge volume of data that this kind of simulation needs, using current technology. This work focuses on one of the main steps of such a simulation: computing the voltage on the neurons’ morphology. This is carried out using the Hines algorithm. Although this algorithm is the optimal method in terms of number of operations, it needs non-trivial modifications to be efficiently parallelized on NVIDIA GPUs. We propose several optimizations to accelerate this algorithm on GPU-based architectures, exploring the limitations of both the method and the architecture, in order to efficiently solve a high number of Hines systems (neurons). Each optimization is analyzed and described in depth. To evaluate the impact of the optimizations on real inputs, we have used 6 morphologies that differ in size and number of branches. Our studies show that the proposed optimizations achieve high performance on computations with a high number of neurons: our GPU implementations are about 4× and 8× faster than the OpenMP multicore implementation (16 cores), using one and two NVIDIA K80 GPUs respectively. It is also important to highlight that these optimizations continue to scale even when dealing with a very high number of neurons. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 720270 (HBP SGA1), from the Spanish Ministry of Economy and Competitiveness under the project Computación de Altas Prestaciones VII (TIN2015-65316-P) and from the Departament d’Innovació, Universitats i Empresa de la Generalitat de Catalunya, under project MPEXPAR: Models de Programació i Entorns d’Execució Paral·lels (2014-SGR-1051).
We acknowledge the support of NVIDIA through the BSC/UPC NVIDIA GPU Center of Excellence. Antonio J. Peña is cofinanced by the Spanish Ministry of Economy and Competitiveness under Juan de la Cierva fellowship number IJCI-2015-23266. Peer Reviewed. Postprint (published version)

    Plasmonic Response of Nested Nanoparticles with Arbitrary Geometry

    The plasmonic response of nanoshells with different geometries is calculated using effective medium theory (EMT). The cases of multishell nanospheres, spheroids, and nested nanocubes are investigated. The model is extended to include radiation and nonspherical geometries in order to study confocal and nonconfocal ellipsoidal nanoshells, as well as the combination of nested nanospheres with nanospheroids or nanocubes. Compared with more computationally intensive methods, such as the finite-difference time-domain (FDTD) method or the discrete dipole approximation (DDA), the modified EMT gives fast and accurate results. Additionally, the model is conceptually simple, allowing a direct physical interpretation of the results. This methodology is useful for experimentalists who need fast and reliable predictions of the plasmonic behavior of complex nanoparticles.

    Service orchestration on a heterogeneous cloud federation

    In recent years, cloud computing has emerged as a new way to obtain computing resources on demand in a very dynamic fashion, paying only for what is consumed. Nowadays, several hosting providers follow this approach, offering resources with different capabilities, prices and SLAs. Therefore, depending on the users' preferences and the application requirements, one resource provider may be a better fit than another. In this paper, we present an architecture for federating clouds: aggregating resources from different providers, deciding which resources and providers best serve the users' interests, and coordinating the application deployment on the selected resources, giving the user the impression that a single cloud is used. This work is supported by the Ministry of Science and Technology of Spain and the European Union under contract TIN2007-60625 (FEDER funds), the Ministry of Industry of Spain under contract TSI-020301.1009.3 (Avanza NUBA project) and Generalitat de Catalunya under contract 2009-SGR-980. Peer Reviewed

    Monitoring and steering Grid applications with GRID superscalar

    We present the design and implementation of a general task monitoring and steering system for Grid applications (GSTAT). The system is integrated in the GRID superscalar (GRIDSs) programming framework. Information at the application, Grid node, and individual task levels is supplied upon request. Using the steering capabilities, individual tasks or the whole application can be cancelled. The corresponding jobs can be restarted using fault tolerance and checkpointing capabilities based on GRIDSs. In addition, the computational resources assigned to an application can be modified. GSTAT is tested using high throughput and high performance computing cases on an Internet-based Grid of computers. © 2009 Elsevier B.V. All rights reserved. This work has been cofinanced by FEDER funds and the Consejería de Educación y Ciencia de la Junta de Comunidades de Castilla-La Mancha (grant # PBI08-0008). The Ministerio de Ciencia y Tecnología (grant # FIS2005-00293; AYA 2008-00446) and the Universidad de Castilla-La Mancha are also acknowledged. The authors wish to thank the Barcelona Supercomputing Center-Centro Nacional de Supercomputación (BSC-CNS), the Facultad de Ciencias Químicas of the Universidad Autónoma de Puebla (Mexico) and the DAMIR group of the Consejo Superior de Investigaciones Científicas (CSIC, Spain) for allowing the use of their systems. Peer Reviewed